NSF PAR Search | NSF Public Access Repository

A FAIR Resource Recommender System for Smart Open Scientific Inquiries

https://doi.org/10.3390/app15158334

Sakib, Syed N; Rubaiat, Sajratul Y; Naha, Kallol; Rahman, Hasan H; Jamil, Hasan M (July 2025, Applied Sciences)

A vast proportion of scientific data remains locked behind dynamic web interfaces, often called the deep web—inaccessible to conventional search engines and standard crawlers. This gap between data availability and machine usability hampers the goals of open science and automation. While registries like FAIRsharing offer structured metadata describing data standards, repositories, and policies aligned with the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, they do not enable seamless, programmatic access to the underlying datasets. We present FAIRFind, a system designed to bridge this accessibility gap. FAIRFind autonomously discovers, interprets, and operationalizes access paths to biological databases on the deep web, regardless of their FAIR compliance. Central to our approach is the Deep Web Communication Protocol (DWCP), a resource description language that represents web forms, HyperText Markup Language (HTML) tables, and file-based data interfaces in a machine-actionable format. Leveraging large language models (LLMs), FAIRFind combines a specialized deep web crawler and web-form comprehension engine to transform passive web metadata into executable workflows. By indexing and embedding these workflows, FAIRFind enables natural language querying over diverse biological data sources and returns structured, source-resolved results. Evaluation across multiple open-source LLMs and database types demonstrates over 90% success in structured data extraction and high semantic retrieval accuracy. FAIRFind advances existing registries by turning linked resources from static references into actionable endpoints, laying a foundation for intelligent, autonomous data discovery across scientific domains.
Free, publicly-accessible full text available July 26, 2026
Potency of Latent Spaces in Inverse Quantum Dye Design

https://doi.org/10.1145/3733723.3733742

Rahman, Hasan H; Flores, Jonathan; Spear, Lawrence; Li, Lan; Jamil, Hasan M (June 2025, ACM)

The discovery of functional dye materials with superior optical properties is crucial for advancing technologies in biomedical imaging, organic photovoltaics, and quantum information systems. Recent advancements highlight the need to accelerate this discovery process by integrating computational strategies with experimental methods. In this regard, we have employed a computational approach to explore the latent space of dye materials, utilizing swarm optimization techniques to efficiently navigate complex chemical spaces and identify optimal values of molecular properties using machine learning methods based on target properties, such as high extinction coefficients ($\varepsilon$). The latent-space-based evaluation outperformed all available features of a domain. This approach, implemented with variational autoencoders (VAEs), enhances inverse material design by systematically correlating molecular parameters with desired optical characteristics. In this process, by defining target properties as inputs, the model effectively determines the key molecular features necessary for engineering high-performance dye compounds.
Free, publicly-accessible full text available June 23, 2026
Implementing a Declarative Query Language for High Level Machine Learning Application Design

https://doi.org/10.2139/ssrn.5210204

Rahman, Hasan; Jamil, Hasan (April 2025, SSRN)

The rising popularity of data science and machine learning (ML) across diverse domains, often driven by users with limited computational expertise, reflects the growing commoditization of ML tools. However, the advanced technical and mathematical knowledge demanded by current ML frameworks poses a formidable barrier for non-experts, preventing them from fully exploiting these powerful platforms. In response, we introduce MQL, a novel declarative query language for ML application design, alongside its corresponding query processing engine. We demonstrate that abstracting ML concepts, similarly to SQL, can preserve both processing efficiency and analytical fidelity. Our implementation defines MQL semantics through a semantics-preserving mapping to widely understood ML code fragments. By leveraging task-specific meta-features, heuristic knowledge, and standard assessment methods, our system ranks candidate ML libraries, selects optimal algorithms, and frees users from these choices. We introduce mapping algorithms to ensure that each MQL program retains its intended semantics and present experimental evaluations demonstrating that MQL’s algorithmic selections not only match but surpass human-engineered solutions in terms of performance and model accuracy. By offering declarative queries as a high-level alternative to traditional coding, MQL significantly reduces the complexity of data analysis pipeline construction, thereby democratizing machine learning application design. To foster shared community development, this work is maintained as an open-source project at https://github.com/hmjamil/mql.
Free, publicly-accessible full text available April 8, 2026
